LitLin 18_4 423-447 fqh009 FIN

Author

  • Ross Clement
Abstract

Large, real-world data sets have been investigated in the context of authorship attribution of real-world documents. Ngram measures can be used to accurately assign authorship for long documents such as novels. A number of 5 (authors) × 5 (movies) arrays of movie reviews were acquired from the Internet Movie Database. Both ngram and naive Bayes classifiers were used to classify along both the authorship and the topic (movie) axes. Both approaches yielded similar results, and authorship was detected as accurately as, or more accurately than, topic. Part-of-speech tagging and function-word lists were used to investigate the influence of structure on classification tasks, using documents with the meaning removed but the grammatical structure intact.
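The abstract does not give the classifiers' details, so the following is only a minimal sketch under our own assumptions, not the author's implementation. It shows a multinomial naive Bayes classifier with add-one (Laplace) smoothing that assigns a tokenized review to a label (an author or a movie, depending on which axis the labels encode). The helper `mask_content_words` crudely imitates the "meaning removed, structure intact" condition by keeping only words from a small, hypothetical function-word list (`FUNCTION_WORDS`); the paper's own function-word list and part-of-speech tagging are not reproduced here, and the training reviews are invented toy data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny function-word list for illustration only;
# the paper's actual list is not given in the abstract.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "and", "but", "or", "is", "was",
    "it", "this", "that", "with", "for", "on", "as", "not", "i", "he", "she",
}


def tokenize(text):
    """Lower-case whitespace tokenizer that strips surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(word)
    return tokens


def mask_content_words(tokens, function_words=FUNCTION_WORDS):
    """Keep function words and replace every other token with '<W>'.

    This removes most of the meaning while keeping a rough trace of
    grammatical structure (an assumption-laden stand-in for POS tagging).
    """
    return [t if t in function_words else "<W>" for t in tokens]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            # Log prior: fraction of training documents with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((counts[t] + 1) / (total + v))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy usage: labels are (hypothetical) review authors.
train = [
    ("I loved this film and the acting was superb", "alice"),
    ("I loved the score and the pacing was superb", "alice"),
    ("Dreadful plot, wooden cast, avoid it", "bob"),
    ("Wooden dialogue, dreadful pacing, avoid", "bob"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)
print(model.predict(tokenize("the acting was superb and I loved it")))  # → alice
```

Classifying along the topic axis instead only requires using movie titles as the labels; training on `mask_content_words` output instead of raw tokens gives a structure-only variant of the same classifier.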


Similar sources

LitLin 18_4 361-378 fqh002 FIN

This paper presents the newly released Lancaster Corpus of Mandarin Chinese (LCMC), a Chinese match for the FLOB and Frown corpora of British and American English. We first discuss the major decisions we took when building the corpus. These relate to sampling, text collection, mark-up, and annotation. Following from this we use the corpus to study aspect marking in Chinese and British/American ...


LitLin 19_4 453-475 fqh034 FIN

Delta, a simple measure of the difference between two texts, has been proposed by John F. Burrows as a tool in authorship attribution problems, particularly in large ‘open’ problems in which conventional methods of attribution are not able to limit the claimants effectively. This paper tests Delta’s effectiveness and accuracy, and shows that it works nearly as well on prose as it does on poetry...


Monitoring Winter and Summer Abundance of Cetaceans in the Pelagos Sanctuary (Northwestern Mediterranean Sea) Through Aerial Surveys

Systematic long-term monitoring of abundance is essential to inform conservation measures and evaluate their effectiveness. To instigate such work in the Pelagos Sanctuary in the Mediterranean, two aerial surveys were conducted in winter and summer 2009. A total of 467 (131 in winter, 336 in summer) sightings of 7 species was made. Sample sizes were sufficient to estimate abundance of fin whale...


Reactor for Producing Large Particles of Materials from Gases

3,371,997 3/1968 Jordan et al ........................ 423/450 4,013,420 3/1977 Cheng ................................. 422/156 4,084,024 4/1978 Schumacher ....................... 423/350 4,154,870 5/1979 Wakefield ....................... 423/350 X 4,241,022 12/1980 Kraus et al. ......................... 422/156 4,292,344 7/1981 McHale .......................... 423/349 X 4,314,525 2/1982 Hsu e...


MicroRNA-423 promotes cell growth and regulates G(1)/S transition by targeting p21Cip1/Waf1 in hepatocellular carcinoma.

MicroRNAs (miRNAs) are small non-coding RNA molecules that are often located in genomic breakpoint regions and can act as oncogenes or tumor suppressor genes in human cancer. Our previous study showed that microRNA-423 (miR-423), which localized to the frequently amplified region of chromosome 17q11, was upregulated in hepatocellular carcinoma (HCC). However, the potential functions and exact m...


Journal title:

Volume 18, Issue 4

Pages 423-447

Publication date: 2004